-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-10003] Improve readability of DAGScheduler #8217
Conversation
logInfo("Submitting " + shuffleStage + " (" + | ||
shuffleStage.rdd + "), which is now runnable") | ||
submitMissingTasks(shuffleStage, jobId) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a potentially controversial change. Let me explain:
Context. This block of code happens when a map stage finishes and all shuffle files are present. What happens here is that the newly runnable stages are submitted and moved from waitingStages
to runningStages
.
Why is it removed? Note that we already call submitWaitingStages()
at the end of this method, which does the exact same thing. It seems to be me that this block of code doesn't do anything extra.
Test coverage. Note that this only affects the case where the map shuffle stage has finished successfully, which is already covered in any existing test in DAGSchedulerSuite
that runs a successful shuffle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks good to me -- I went on a git blame adventure and this code is super old and it's not clear that it was ever necessary, and it definitely doesn't seem necessary now that we have the submitWaitingStages() call at the bottom.
Test build #40932 timed out for PR 8217 at commit |
Test build #1621 has finished for PR 8217 at commit
|
@kayousterhout can you take a look? |
…eadability Conflicts: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
3b9f14f
to
574fb1e
Compare
changeEpoch = true) | ||
|
||
clearCacheLocs() | ||
|
||
// Some tasks had failed; let's resubmit this shuffleStage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why move this outside the if-statement? It looks like it only holds when the if-statement is true, so less confusing I think to have it inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, makes sense
LGTM except for the one question about the moved comment |
Test build #41983 has finished for PR 8217 at commit
|
Test build #41986 has finished for PR 8217 at commit
|
Note: this is not intended to be in Spark 1.5!
This patch rewrites some code in the
DAGScheduler
to make it more readable. In particular